This is a tutorial for the Nextstrain Workshop hosted by the Bahl Lab at UGA on 20 June 2024. In this document, you will find step-by-step instructions for analyzing pathogen genomic data using Nextstrain and creating presentations using Nextstrain Narratives.
You should already have a UGA account (MyID) with multi-factor authentication set up via the Duo Mobile app (ArchPass).
Additionally, you will need a GitHub Account (GitHub Onboarding), which is necessary for sharing results from your analyses in the Nextstrain Narratives.
We will be using several software packages within a high-performance computing environment made available by the Georgia Advanced Computing Resource Center (GACRC). Fortunately, GACRC has installed the necessary software for us, including Nextstrain for phylogenetic analysis pipelines and Singularity as the container runtime. Although the software is already installed, we still need to do some basic setup before we can begin our analyses. This setup is described in detail within this tutorial.
For our training today, the data have already been compiled and cleaned. However, here are some resources from which you may curate your own datasets in the future:
We will begin by working through an example provided by Nextstrain which runs a Zika virus phylogenetic workflow.
Find and open the X Desktop Session under “Interactive Apps” on Sapelo2. You can either select it from the drop-down menu at the top of the webpage or scroll down a bit on the home page to find it.
Specify resources:
This will send your request for resources to a queue:
After a few moments, the X desktop session should be ready.
Note: you may be prompted to allow access to the clipboard. Click allow as this will make it easier to copy & paste commands from your local clipboard into the X Desktop environment.
Open up a terminal by clicking the icon at the bottom of the
screen.
Load the Nextstrain module with the following command:
ml Nextstrain-CLI/20240604
You may either copy & paste into the command line or type it out yourself.
nextstrain setup --set-default singularity
Note: this may take a few minutes to complete. Please, be patient :) The command line will remain blank while the command is processing.
nextstrain shell .
augur --help
auspice --help
When nextstrain CLI is loaded successfully, you will see the rainbow icon.
If the setup was successful, you may exit the Nextstrain CLI shell by using the exit command:
exit
Once closed, you will not see the rainbow icon.
If you are unable to copy & paste into the X Desktop session but wish to do so, you may leave the X Desktop session open and return to the browser tab for GACRC OnDemand and click the shell icon to open a new tab.
Note: whichever option you choose to access the command line, you should have the same working directory (/home/{Your MyID}). You can check your working directory using the pwd command and print its contents using dir.
ml Nextstrain-CLI/20240604
Create a directory named nextstrain_training with:
mkdir nextstrain_training
cd nextstrain_training
git clone https://github.com/nextstrain/zika-tutorial
nextstrain build --cpus 1 zika-tutorial/
Note: this may take a couple of minutes. Please, be patient :)
The path to Firefox is: /apps/eb/Firefox/120.0.1.
Click the app icon to open.
Navigate to the browser settings and make Firefox the default
browser. You may close the browser once you’ve
set the default.
cd ..
Note: the directory path in the next command must match the location of your results relative to your present working directory. If you are working from a different directory, adjust the path accordingly.
nextstrain view nextstrain_training/zika-tutorial/auspice/
When you are finished viewing your results, you can close Firefox and the terminal.
Nextstrain offers an “Advanced Data Visualization Platform” that allows researchers to tell data-driven stories about viruses and other pathogens. It uses a special type of text file (Markdown) to combine clear explanations (on the left) with visualizations (like family trees and maps) on the right. This flexibility lets scientists tailor the information for any audience, from experts to decision-makers. The platform is particularly useful for quickly analyzing recent genetic data and then sharing those findings in an interactive and informative way.
This folder contains two key files you’ll need: “zika.json” and “zika_root-sequence.json”.
Note: you can verify the structure of your .json file. The Auspice website can help you confirm the validity of your generated JSON file to ensure the file is structured correctly. Additionally, it will allow you to preview the visualizations you requested during the analysis. This lets you verify if the visualizations were created as intended and accurately represent your data.
It offers a template where you can upload your downloaded JSON files to build your narrative.
This ensures compatibility with the downloaded JSON files (“zika.json” and “zika_root-sequence.json”). Any deviation in the repository name (including capitalization) will prevent access to these files, hindering your ability to publish the phylogenetic tree and build your narratives.
Once you have created your repository, you will see four items in it: two folders and two files.
Next, you will see that you can add a new file, but we need a new folder. Making a new folder is very similar to making a new file: type the word “auspice” and add a “/” right after it.
Make sure to commit your changes.
Then click on “Add file” and then “Upload files”. In this step, upload the two files you downloaded from the auspice folder in OnDemand: “zika.json” and “zika_root-sequence.json”. Please commit the changes.
This is how your repository should look now:
This step lets you access your Nextstrain analysis through a web link. The link follows a specific format: https://nextstrain.org/community/{YourGitHubUsername}/zika
For example, if my username is “taneenak”, my link would be: https://nextstrain.org/community/taneenak/zika
This link will allow you to view and interact with your Zika analysis on the Nextstrain platform. This should look similar to the viewer we used with Firefox earlier.
Now, let’s shift our focus to the “narratives” folder.
Once you’ve located the file within the “narratives” folder, you can edit it to create your narrative. This typically involves opening the file in a suitable text editor and modifying its name and content.
This is how I named my file: zika_20240620.md
The file name must start with the name of the repository (zika in this case), followed by “_” and then a more specific name of your choosing.
The Markdown File: The narrative you’ll be crafting is written in Markdown, a user-friendly language invented by John Gruber in 2004. Markdown simplifies formatting text and effortlessly translates it into HTML (and even other formats!). This means you can incorporate images, links, tables, and more into your narrative, making it visually engaging and informative.
One of Markdown’s strengths is its ease of adding links. To include a link, simply enclose the text you want displayed in square brackets [] followed by the website address in parentheses (). This will prove useful as you edit your narrative file.
Your narrative file will consist of two main sections: the title and the slides.
The title is set off by two lines of three hyphens (---), one above and one below this section, creating a clear distinction. The file should already contain the title by default.
Each of the slides represents a specific result you want to include in your narrative. These slides may contain text, formatted using Markdown syntax and embedded elements like images or links to enhance your explanations.
As mentioned earlier, the title section of your Markdown file is enclosed between two lines of three hyphens (---). But why is the title important?
The title serves two key functions:
First Slide: It becomes the content of your very first slide within the narrative.
Identification: It provides a clear label for your narrative, making it easy to identify within the Nextstrain platform or when shared with others.
Edit your title in the markdown file
Yours may look something like this:
---
title: "Zika Virus Phylogenetic Analysis"
authors: "Tanin Rajamand"
authorLinks: "mailto:tr44022@uga.edu"
created date: "June 10, 2024"
last update: "June 17, 2024"
dataset: "https://nextstrain.org/community/taneenak/zika"
auspiceMainTitle: "Zika Tree"
abstract: |
  This narrative explores the phylogenetic analysis of the Zika virus using Nextstrain. It includes slides on the tree, map views of the virus's evolution, and entropy analysis.
---
Note that the dataset field is the link we used earlier to view our results.
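As a rough aid, the Python sketch below checks that a narrative file opens with a title section delimited by two --- lines and that it contains the title and dataset fields. The function names and the choice of required fields are assumptions made for illustration; this is not the validation that Nextstrain itself performs.

```python
# Assumption: the two fields this tutorial treats as essential.
REQUIRED_KEYS = ("title", "dataset")


def front_matter_keys(markdown_text):
    """Return the top-level keys found between the first two '---' lines."""
    lines = markdown_text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("narrative must start with a '---' line")
    keys = []
    for line in lines[1:]:
        if line.strip() == "---":
            return keys
        # Top-level keys start in column 0; indented lines belong to a
        # block value such as the text under 'abstract: |'.
        if line and not line[0].isspace() and ":" in line:
            keys.append(line.split(":", 1)[0].strip())
    raise ValueError("closing '---' line not found")


def missing_keys(markdown_text):
    """List the required keys that the title section does not define."""
    found = front_matter_keys(markdown_text)
    return [key for key in REQUIRED_KEYS if key not in found]


example = """---
title: "Zika Virus Phylogenetic Analysis"
authors: "Tanin Rajamand"
dataset: "https://nextstrain.org/community/taneenak/zika"
abstract: |
  This narrative explores the phylogenetic analysis of the Zika virus.
---

# [First slide](https://nextstrain.org/community/taneenak/zika)
"""

print(missing_keys(example))  # prints []
```

An empty list means both fields were found; the online debugger mentioned later in this tutorial remains the more reliable check.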
You can explore all the data that is available to you in this slide. And when you click on the author’s name, you can reach them. When you are creating your narratives, you can add your GitHub, email, or any other way of communication you prefer.
We previously discussed the importance of the title section, but what information should it contain?
Here are the essential elements recommended for your title section: the title of your Nextstrain narrative and the dataset you used. The dataset can be the Nextstrain build you created earlier.
While these are the core elements, you should further enhance your title with additional details:
By incorporating this information into your title, you create a clear and informative introduction to your narrative.
To create the link to your narrative, you need to follow a simple pattern: https://nextstrain.org/community/narratives/{Your_GitHub_Username}/{the_disease}/{The_specific_name_you_chose_for_your_narrative_file}
For example, the link for the Zika narrative under the taneenak account is: https://nextstrain.org/community/narratives/taneenak/zika/6202024
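The link pattern can be sketched in Python as follows. The helper name narrative_url is hypothetical, and the sketch assumes the naming rule described earlier, where the file name starts with the repository name followed by “_”.

```python
from pathlib import PurePosixPath

BASE = "https://nextstrain.org/community/narratives"


def narrative_url(github_user, repo, narrative_file):
    """Build the narrative link from a file such as narratives/zika_6202024.md."""
    path = PurePosixPath(narrative_file)
    if path.suffix != ".md":
        raise ValueError("narrative files are Markdown (.md) files")
    prefix = repo + "_"
    stem = path.stem  # file name without directory and without ".md"
    if not stem.startswith(prefix) or len(stem) == len(prefix):
        raise ValueError(f"file name must look like {prefix}<name>.md")
    name = stem[len(prefix):]  # the part after "<repo>_"
    return f"{BASE}/{github_user}/{repo}/{name}"


print(narrative_url("taneenak", "zika", "narratives/zika_6202024.md"))
# https://nextstrain.org/community/narratives/taneenak/zika/6202024
```

If the function raises an error, the file name probably does not start with the repository name, which is the most common reason a narrative link fails to load.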
After this step, you can refresh this link as you update the markdown file and see the changes as you work on your file.
In this step, click on your repository and scroll down. This is what you should see:
Then start editing this file and change the link to the new link you got for the narrative. For example, I changed mine to this:
Markdown/the Slides: Now that we’ve covered the title section, let’s delve into the slides - the heart of your narrative. Nextstrain offers a powerful two-panel format for each slide:
Left Panel: Text-Based Insights - This panel provides the platform for you to present your analysis and interpretations in detail. Explain your findings, highlight key points, and guide your audience through your scientific journey.
Right Panel: Visual Storytelling - This panel is your creative canvas! Here, you can embed dynamic elements like phylogenetic trees, maps, images and more. This flexibility allows you to tailor the content to resonate with any audience, from scientific experts to policymakers.
One great option is what we call “Static Content”, which is extremely useful when you want to give your audience background information about a disease. Instead of visualizations, the right side of the slide holds more text and images. For example, we incorporated a slide in our narratives that talks about what the virus is, its treatments, and its symptoms.
We later also added an image that shows the Zika Geographic Risk Classifications on a Map. You can use this section to add pictures of the virus you are specifically working on to show its structure, the targeted organs, and its impact on various host species. There are lots of other options available for you based on your expertise and your audience’s needs.
Below your title section, add a # to indicate a new slide.
In Markdown, you can create a linked title that stands out visually and offers additional information through a hyperlink. To achieve this, follow this pattern:
- Start with a # symbol, indicating a level 1 heading (the main title) for your narrative.
- Within square brackets [], place the actual text you want to be displayed as the title.
- Following the title text, in parentheses (), add the web address (URL) you want to link to. When someone clicks on the title, they’ll be directed to that webpage.
Here is an example of the code in markdown:
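As a small illustration of this pattern, the Python sketch below assembles a linked slide title from a title text and a URL. The helper name linked_heading is hypothetical; the title and link are the ones used in this tutorial.

```python
def linked_heading(title, url):
    """Return a level 1 Markdown heading whose text links to url."""
    if "[" in title or "]" in title:
        # Square brackets in the title would break the [text](url) pattern.
        raise ValueError("title must not contain square brackets")
    return f"# [{title}]({url})"


print(linked_heading("Practice/ Full View",
                     "https://nextstrain.org/community/taneenak/zika"))
# # [Practice/ Full View](https://nextstrain.org/community/taneenak/zika)
```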
In this step, I would like you to have the full view of the map, the tree, and the entropy all on the right side of the panel. Add the link to the full view as follows. Here, what is in the brackets [] is the title: Practice/ Full View. The link refers to the full view, and you can get it from the Nextstrain build we already created. For example, this was the link I used:
https://nextstrain.org/community/taneenak/zika
Yours will be different based on your GitHub username.
In this step, I would like you to have the full view of the map only on the right side of the panel. Here we turn off the entropy and the tree and we only focus on the map.
In this step, I would like to show you how to make an animation of the tree. We filter the data based on country and we chose Brazil, Venezuela, Singapore, and the Dominican Republic as examples.
You just click on the “Date Range” button.
Here is the code:
Here we will be using Singapore and the Dominican Republic. Turn off the entropy and turn on the tree and map. Then click on the grid display.
Here is the code:
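The view shown on the right side of a slide is controlled by the query string at the end of the dataset link. The Python sketch below builds such a link; it assumes the parameter names that Auspice writes into the browser address bar (d for the visible panels, f_country for a country filter, p for the full or grid layout). The helper name slide_url is hypothetical, and the most dependable way to get the link is still to set up the view in the browser and copy the address.

```python
from urllib.parse import quote, urlencode


def slide_url(dataset_url, panels, layout=None, filters=None):
    """Append view settings to a dataset link.

    panels  : panels to show, e.g. ["tree", "map"]
    layout  : "grid" or "full" (optional)
    filters : mapping of trait name to selected values (optional)
    """
    params = {"d": ",".join(panels)}
    for trait, values in (filters or {}).items():
        params[f"f_{trait}"] = ",".join(values)
    if layout:
        params["p"] = layout
    # Keep commas readable and encode spaces as %20 rather than '+'.
    return dataset_url + "?" + urlencode(params, safe=",", quote_via=quote)


base = "https://nextstrain.org/community/taneenak/zika"

# Map only, as in the earlier slide.
print(slide_url(base, ["map"]))

# Tree and map side by side, filtered to two countries.
print(slide_url(base, ["tree", "map"], layout="grid",
                filters={"country": ["Singapore", "Dominican Republic"]}))
```

Either link can then be placed in the parentheses of a slide title.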
Note that the content for the left panel directly follows the # and the right panel content is between the sets of ``` with auspiceMainDisplayMarkdown included next to the first set of backticks.
This tells Nextstrain Narratives to format the content on the right side of the slide. While GitHub might render this section as a code block, rest assured, that Nextstrain Narratives understands the purpose and will display the text appropriately within your slide. Here is a picture of how GitHub will read this:
To make bullet points in this section, you need to add two lines between consecutive bullets. Otherwise, the items run together on a single line instead of each starting on a new line.
```auspiceMainDisplayMarkdown
# First Title:
Content.
# Second Title:
- First bullet point
- Second bullet point
```
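To show how the pieces of a static-content slide fit together, the Python sketch below assembles one slide: the linked title, the left panel text, and the right panel content wrapped in an auspiceMainDisplayMarkdown block. The helper name static_slide and the sample text are hypothetical.

```python
# The fence is built programmatically so the backticks do not interfere
# with the code block this example is displayed in.
FENCE = "`" * 3


def static_slide(title, url, left_text, right_markdown):
    """Return the Markdown for one slide with static right panel content."""
    return "\n".join([
        f"# [{title}]({url})",
        "",
        left_text.strip(),
        "",
        FENCE + "auspiceMainDisplayMarkdown",
        right_markdown.strip(),
        FENCE,
        "",
    ])


slide = static_slide(
    "Background",
    "https://nextstrain.org/community/taneenak/zika",
    "A short introduction shown in the left panel.",
    "# First Title:\nContent.",
)
print(slide)
```

The printed text can be pasted below the title section of the narrative file.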
Incorporating images is another powerful way to enhance your narrative’s visual appeal. Here’s the pattern you’ll follow to include photos within the right-side panel:
- <img src="...">: This indicates that you’re inserting an image.
- src="image_address.png": This specifies the source of the image file.
- alt="image_name": The alt attribute provides alternative text for the image in case the image cannot be loaded or for visually impaired users who rely on screen readers.
- width="100%": This attribute controls the width of the image.
Before using this code, make sure to upload your images to the “figures” section of your GitHub repository. This is what it will look like:
The README.md and the toy_alignment_tree.png already exist in your repository since you are using a template to create yours. For our example, we uploaded a photo and named it zika.png.
Good news! You can include multiple images within a single slide’s right panel. Simply follow the pattern and everything will be set. By following these guidelines, you can effectively integrate images into your narratives, enriching the visual experience for your audience. An example of what including an image will look like in the markdown file.
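The image pattern can be sketched in Python as a helper that fills in the three attributes described above. The name img_tag is hypothetical, and the src value is the placeholder from the pattern; replace it with the address of the image you uploaded.

```python
from html import escape


def img_tag(src, alt, width="100%"):
    """Return an <img> element with escaped attribute values."""
    return (
        f'<img src="{escape(src, quote=True)}" '
        f'alt="{escape(alt, quote=True)}" '
        f'width="{escape(width, quote=True)}">'
    )


print(img_tag("image_address.png", "image_name"))
# <img src="image_address.png" alt="image_name" width="100%">
```

Calling the helper once per image and placing the results on separate lines gives several images in the same right panel.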
Here is an example that shows the Zika narrative file I prepared:
https://github.com/taneenak/zika/blob/main/narratives/zika_6202024.md
Try to tell a story with slides that are meaningful to your audience. Show the sources and sinks of transmission, and examine whether a single introduction was followed by onward transmission or whether there were multiple introductions. Keep your virus in mind: although Zika is transmitted by mosquito bites, it can also be transmitted through sexual contact or from mother to baby during pregnancy, which is why transmission can continue within a population. Finally, keep the specific question you are trying to answer in view and consider which type of visualization would help you answer it best.
When you preview your work in GitHub, you can see the titles section of the markdown is in a table format, the smaller titles are bolded in blue, the texts on the right-hand side panel are in a code block, and everything else appears as regular text.
Head over to this GitHub. There, you’ll find an example Markdown file related to Monkey Pox and Influenza narratives. Download this file as a reference. Then use this debugger tool which will parse your Markdown code to provide a preview of the corresponding narrative it would generate. This allows you to identify any errors or formatting issues that might affect your narrative. This example was given to you as a:
The link to the GitHub: https://github.com/nextstrain/narratives/blob/master/how-to-write_basics.md
The link to the debugger website: https://nextstrain.org/edit/narratives
This is how the debugger website will look once you have uploaded that markdown file.
Everything is green and nothing is red! Good news! If something is gray, it just means that this particular aspect of the narrative is not accounted for with this link. There’s no need to worry, it doesn’t signify errors.
Now that you’ve crafted your narrative in the Markdown file on GitHub, it’s wise to verify its functionality. Download your narratives file from your GitHub Repository and check it against the same debugger website.
Look at the top right corner of your narrative: there is a button called “Explore the Data Yourself”. When you click on it, it takes you to the Nextstrain build we created earlier. You can click on it again to return to the narrative.
Creating a pathogen workflow
Create a folder for results
Enter an interactive Nextstrain shell in the current directory
Index the Sequences
Precalculate the composition of the sequences (e.g., numbers of nucleotides, gaps, invalid characters, and total sequence length) prior to filtering. The resulting sequence index speeds up subsequent filter steps especially in more complex workflows.
Filter the Sequences
Filter the parsed sequences and metadata to exclude strains from subsequent analysis. And subsample the remaining strains to a fixed number of samples per group.
Align the Sequences
Create a multi-sequence alignment using a custom reference. Now the pathogen sequences are ready for analysis.
Construct the Phylogeny
Infer a phylogenetic tree from the multi-sequence alignment.
Get a Time-Resolved Tree
Augur can also adjust branch lengths in this tree to position tips by their sample date and infer the most likely time of their ancestors, using TreeTime.
Annotate the Phylogeny: Reconstruct Ancestral Traits
TreeTime can also infer ancestral traits from an existing phylogenetic tree and the metadata annotating each tip of the tree.
Annotate the Phylogeny: Infer Ancestral Sequences
Next, infer the ancestral sequence of each internal node and identify any nucleotide mutations on the branches leading to any node in the tree.
Annotate the Phylogeny: Identify Amino-Acid Mutations
Identify amino acid mutations from the nucleotide mutations and a reference sequence with gene coordinate annotations.
Export the Results
Finally, collect all node annotations and metadata and export it in Auspice’s JSON format. This refers to three config files to define colors via config/colors.tsv, latitude and longitude coordinates via config/lat_longs.tsv, as well as page title, maintainer, filters present, etc., via config/auspice_config.json. The resulting tree and metadata JSON files are the inputs to the Auspice visualization tool.
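The steps above map onto one augur subcommand each. The Python sketch below only assembles and prints the command lines as a dry run; it does not execute anything. The file names (data/sequences.fasta, config/zika_outgroup.gb, and so on) and the numeric settings are assumptions modeled on the layout of the Nextstrain Zika tutorial repository, so check them against your own files and against augur --help before running any command inside the Nextstrain shell.

```python
import shlex


def zika_workflow_commands(data="data", config="config", results="results",
                           output="auspice/zika.json"):
    """Return one augur command (as an argument list) per workflow step."""
    sequences = f"{data}/sequences.fasta"
    metadata = f"{data}/metadata.tsv"
    reference = f"{config}/zika_outgroup.gb"
    index = f"{results}/sequence_index.tsv"
    filtered = f"{results}/filtered.fasta"
    aligned = f"{results}/aligned.fasta"
    raw_tree = f"{results}/tree_raw.nwk"
    tree = f"{results}/tree.nwk"
    # Node data files written by refine, traits, ancestral, and translate.
    node_data = [f"{results}/{name}.json"
                 for name in ("branch_lengths", "traits", "nt_muts", "aa_muts")]
    return [
        ["augur", "index", "--sequences", sequences, "--output", index],
        ["augur", "filter", "--sequences", sequences, "--sequence-index", index,
         "--metadata", metadata, "--exclude", f"{config}/dropped_strains.txt",
         "--group-by", "country", "year", "month",
         "--sequences-per-group", "20", "--min-date", "2012",
         "--output", filtered],
        ["augur", "align", "--sequences", filtered,
         "--reference-sequence", reference, "--output", aligned, "--fill-gaps"],
        ["augur", "tree", "--alignment", aligned, "--output", raw_tree],
        ["augur", "refine", "--tree", raw_tree, "--alignment", aligned,
         "--metadata", metadata, "--output-tree", tree,
         "--output-node-data", node_data[0], "--timetree",
         "--coalescent", "opt", "--date-confidence",
         "--date-inference", "marginal", "--clock-filter-iqd", "4"],
        ["augur", "traits", "--tree", tree, "--metadata", metadata,
         "--output-node-data", node_data[1],
         "--columns", "region", "country", "--confidence"],
        ["augur", "ancestral", "--tree", tree, "--alignment", aligned,
         "--output-node-data", node_data[2], "--inference", "joint"],
        ["augur", "translate", "--tree", tree,
         "--ancestral-sequences", node_data[2],
         "--reference-sequence", reference,
         "--output-node-data", node_data[3]],
        ["augur", "export", "v2", "--tree", tree, "--metadata", metadata,
         "--node-data", *node_data,
         "--colors", f"{config}/colors.tsv",
         "--lat-longs", f"{config}/lat_longs.tsv",
         "--auspice-config", f"{config}/auspice_config.json",
         "--output", output],
    ]


# Dry run: print each command so it can be reviewed or copied into the shell.
for command in zika_workflow_commands():
    print(shlex.join(command))
```

Printing the commands rather than running them keeps the sketch safe to try anywhere and makes it easy to see how the output of one step becomes the input of the next.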